Source: https://www.rstudio.com/

Slides

Here are the introduction slides for this practical on Plotting 1.0: ggplot!

Overview

In this practical you’ll practice plotting data with the ggplot2 package.

Cheatsheet

If you don’t have it already, you can download the ggplot2 cheatsheet at https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf. It gives a concise overview of all the major functions in ggplot2.

Examples

# -----------------------------------------------
# Examples of using ggplot2 on the mpg data
# ------------------------------------------------

library(tidyverse)         # Load tidyverse (which contains ggplot2!)

mpg # Look at the mpg data

# Just a blank space without any aesthetic mappings
ggplot(data = mpg)

# Set the overall plotting theme
theme_set(theme_bw())   # theme_bw(), theme_minimal(), theme_classic()

# Now add a mapping where engine displacement (displ) and highway miles per gallon (hwy) are mapped to the x and y aesthetics
ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy))   # Map displ to x-axis and hwy to y-axis

#  Add points with geom_point()
ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy)) +
       geom_point()     

#  Add points with geom_count()
ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy)) +
       geom_count()   

# Again, but with some additional arguments

ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy)) +
  geom_point(col = "red",                    # Red points
             size = 3,                       # Larger size
             alpha = .5,                     # Transparent points
             position = "jitter") +          # Jitter the points
  scale_x_continuous(limits = c(1, 15)) +    # Axis limits
  scale_y_continuous(limits = c(0, 50))


# Assign class to the color aesthetic and add labels with labs()

ggplot(data = mpg, 
  mapping = aes(x = displ, y = hwy, col = class)) +  # Change color based on class column
  geom_point(size = 3, position = 'jitter') +
  labs(x = "Engine Displacement in Liters",
       y = "Highway miles per gallon",
       title = "MPG data",
       subtitle = "Cars with higher engine displacement tend to have lower highway mpg",
       caption = "Source: mpg data in ggplot2")
  

# Add a regression line for each class

ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point(size = 3, alpha = .9) + 
  geom_smooth(method = "lm")

# Add a regression line for all classes

ggplot(data = mpg, 
       mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point(size = 3, alpha = .9) + 
  geom_smooth(col = "blue", method = "lm")


# Facet by class
ggplot(data = mpg,
       mapping = aes(x = displ, 
                     y = hwy, 
                     color = factor(cyl))) + 
  geom_point() +
  facet_wrap(~ class) 


# Another fancier example

ggplot(data = mpg, 
       mapping = aes(x = cty, y = hwy)) + 
  geom_count(aes(color = manufacturer)) +     # Add count geom (see ?geom_count)
  geom_smooth(se = FALSE) +                   # Smoothed line without confidence interval
  geom_text(data = filter(mpg, cty > 25), 
            aes(x = cty, y = hwy, 
                label = rownames(filter(mpg, cty > 25))),
            position = position_nudge(y = -1), 
            check_overlap = TRUE, 
            size = 5) + 
  labs(x = "City miles per gallon", 
       y = "Highway miles per gallon",
       title = "City and Highway miles per gallon", 
       subtitle = "Numbers indicate cars with city mpg > 25",
       caption = "Source: mpg data in ggplot2",
       color = "Manufacturer", 
       size = "Counts")

Tasks

Getting the data and project setup

  1. For this practical we’ll play around with a few different datasets that are contained in different packages. The datasets, and the packages that contain them, are listed below. If you don’t have any of these packages already, make sure to install them!
Dataset        Package
ACTG175        speff2trial
diamonds       ggplot2
Davis          car
heartdisease   FFTrees
  1. Load the tidyverse package.
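A minimal sketch of both setup steps in one script is shown below. It assumes you have an internet connection and a CRAN mirror configured; it only installs the packages that requireNamespace() cannot find, and ggplot2 is not listed separately because it is part of the tidyverse.

```r
# Packages needed for this practical (tidyverse includes ggplot2 and dplyr)
pkgs <- c("tidyverse", "speff2trial", "car", "FFTrees")

# requireNamespace() returns FALSE for packages that are not installed
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Install only the missing packages
if (any(!is_installed)) {
  install.packages(pkgs[!is_installed])
}

# Attach the tidyverse (ggplot2, dplyr, ...) for the rest of the session
library(tidyverse)
```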

Building a plot step-by-step

  1. The diamonds dataset in the ggplot2 package contains information about 53,940 round-cut diamonds. Print the diamonds dataset; it should look like this:
diamonds
# A tibble: 53,940 x 10
   carat       cut color clarity depth table price     x     y     z
   <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
 2  0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
 3  0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
 4  0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
 5  0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
 6  0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48
 7  0.24 Very Good     I    VVS1  62.3    57   336  3.95  3.98  2.47
 8  0.26 Very Good     H     SI1  61.9    55   337  4.07  4.11  2.53
 9  0.22      Fair     E     VS2  65.1    61   337  3.87  3.78  2.49
10  0.23 Very Good     H     VS1  59.4    61   338  4.00  4.05  2.39
# ... with 53,930 more rows
  1. Create the following blank plot

  1. Now add points showing the relationship between each diamond’s weight in carats (carat) and its price (price).

  1. Make the points transparent using the alpha argument to geom_point()

  1. Color the points by their cut

  1. Create different plots for each value of cut using the facet_wrap function:

  1. Add a black smoothed mean line to each plot using geom_smooth(). You can also turn it into a regression line with the method argument (for example, method = "lm").
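The same step-by-step pattern can be sketched on the mpg data from the examples, with drive type (drv) standing in for cut. This is an illustration of the technique, not the diamonds solution. Because every + returns a new plot object, you can store the plot and keep adding one component at a time.

```r
library(ggplot2)

# Step 1: a blank canvas that only defines the aesthetic mappings
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# Step 2: semi-transparent points, colored by drive type
p <- p + geom_point(aes(color = drv), alpha = 0.3)

# Step 3: one panel per drive type
p <- p + facet_wrap(~ drv)

# Step 4: a black linear regression line in every panel
p <- p + geom_smooth(color = "black", method = "lm", formula = y ~ x)

p
```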

Playing with themes

  1. Look at the theme help menu with ?theme_bw() to see a list of all of the standard ggplot themes. Then, using the theme_set() function, try setting your global theme to different themes. When you do, evaluate your previous plotting code again to see the new themes in action!

  2. The ggthemes package contains many additional themes. If you don’t have the package already, install it. Then, open the ggthemes vignette by running the following code:

# Open the ggthemes vignette
vignette("ggthemes", package = "ggthemes")
  1. Now, create the following plot from the mpg data using the FiveThirtyEight theme (theme_fivethirtyeight() from ggthemes).
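The sketch below contrasts the two ways of applying a theme. It assumes ggthemes is installed; theme_classic() is an arbitrary choice for the global default. theme_set() changes the default for every plot drawn afterwards, while adding a theme with + affects only that one plot.

```r
library(ggplot2)
library(ggthemes)   # assumes ggthemes is installed (see the task above)

# theme_set() changes the default theme for all plots drawn afterwards
# and invisibly returns the theme that was active before
old_theme <- theme_set(theme_classic())

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()
p                             # drawn with theme_classic()

# Adding a theme to a single plot overrides the global default for that plot only
p + theme_fivethirtyeight()

# Restore the previous default theme
theme_set(old_theme)
```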

Density geom with geom_density()

  1. Create the following density plot of prices from the diamonds data using the following template:
  • Set the data argument to diamonds
  • Map price to the x aesthetic
  • Add a density geom with geom_density() and set the fill to "tomato1"
  • Add labels
  • Use the minimal theme with theme_minimal()
ggplot(data = XX, 
       mapping = aes(x = XX)) + 
       geom_density(fill = "XX") + 
       labs(x = "XX", 
            y = "XX", 
            title = "XX",
            subtitle = "XX",
            caption = "XX") +
       theme_minimal()
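Here is a hedged illustration on a different variable (highway mpg from the mpg data, with "steelblue" as an arbitrary color). Setting fill outside aes() paints the whole density with one constant color; putting fill inside aes() would instead map it to a variable and create a legend.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = hwy)) +
  geom_density(fill = "steelblue", alpha = 0.6) +   # fill is set, not mapped
  labs(x = "Highway miles per gallon",
       y = "Density",
       title = "Distribution of highway mpg",
       caption = "Source: mpg data in ggplot2") +
  theme_minimal()

p
```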

Boxplot geom with geom_boxplot()

  1. Look at the help menu for geom_boxplot(). Then, create the following boxplot using this template:
ggplot(data = XX,
  mapping = aes(x = XX, y = log(XX), fill = XX)) + 
  geom_boxplot() + 
  labs(y = "XX", 
       x = "XX", 
       fill = "XX",
       title = "XX",
       subtitle = "XX") +
  scale_fill_brewer(palette = "XX")
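The same template can be sketched on the mpg data (highway mpg by vehicle class, with the "Set3" palette chosen arbitrarily). log() inside aes() transforms the values before the box statistics are computed. Since fill is the mapped aesthetic, its legend title is set with fill = in labs().

```r
library(ggplot2)

p <- ggplot(data = mpg,
            mapping = aes(x = class, y = log(hwy), fill = class)) +
  geom_boxplot() +
  labs(x = "Vehicle class",
       y = "log(Highway miles per gallon)",
       fill = "Class",                  # legend title for the fill aesthetic
       title = "Highway mpg by vehicle class") +
  scale_fill_brewer(palette = "Set3")

p
```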

Violin geom with geom_violin()

  1. Now make the following plot using geom_violin(). You can also change the color palette with the palette argument of scale_fill_brewer(). Look at the help menu with ?scale_fill_brewer to see all the possibilities. In the plot below, I’m using "Set1".
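As a minimal sketch on the mpg data (drive type versus highway mpg, not the diamonds plot), geom_violin() is a drop-in replacement for geom_boxplot(). The palette is swapped through scale_fill_brewer().

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = drv, y = hwy, fill = drv)) +
  geom_violin() +                         # mirrored density per group
  scale_fill_brewer(palette = "Set1") +
  labs(x = "Drive type",
       y = "Highway miles per gallon",
       fill = "Drive type")

p
```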

Summary statistics

  1. You can use the stat_summary() function to add summary statistics as geoms to plots. Using this template, create the following plot showing the mean prices of diamonds for each level of clarity.
ggplot(data = XX,
  mapping = aes(x = XX, y = XX)) + 
  stat_summary(fun = "mean", 
               geom = "bar", 
               fill = "white", 
               col = "black") +
  labs(y = "XX", 
       x = "XX", 
       title = "XX", 
       caption = "XX")

  1. Now, create the following plot from the mpg dataframe

  1. You can easily flip the coordinates of a plot by using coord_flip(). Using coord_flip(), flip the x and y coordinates of your previous plot so it looks like this:
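The sketch below combines both ideas on the mpg data (mean highway mpg per vehicle class). It assumes ggplot2 3.3.0 or later, where the summary function is passed with fun; older code uses fun.y. The second stat_summary() layer, which adds ± 1 standard error bars with ggplot2's mean_se(), is an optional extra and is not part of the task.

```r
library(ggplot2)

# Bars showing the mean highway mpg for each vehicle class
p_bars <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  stat_summary(fun = "mean", geom = "bar", fill = "white", col = "black") +
  # Optional: error bars of +/- one standard error around each mean
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.3) +
  labs(x = "Vehicle class", y = "Mean highway miles per gallon")

# coord_flip() swaps the x and y axes, which makes long class labels easier to read
p_flipped <- p_bars + coord_flip()
p_flipped
```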

Saving plots as objects

  1. Create the following plot from the mpg dataset, and save it as an object called myplot

  1. Now, using object assignment (<-), add a regression line to the myplot object with geom_smooth(). Then evaluate the object to see the updated version. It should now look like this:

  1. Using ggsave(), save the object as a pdf file called myplot.pdf. Set the width to 6 inches, and the height to 4 inches. Open the pdf outside of RStudio to make sure it worked!
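The steps in this section can be sketched as follows. The object name example_plot and the temporary output path are hypothetical, chosen so that the sketch does not overwrite your own myplot.pdf. Adding a layer with + does not modify the stored plot in place, so the result has to be reassigned.

```r
library(ggplot2)

# Store the plot in an object instead of printing it right away
example_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# + returns a new plot; reassign it to keep the added regression line
example_plot <- example_plot + geom_smooth(method = "lm", formula = y ~ x)
example_plot

# Save as a 6 x 4 inch PDF (hypothetical file in a temporary folder)
out_file <- file.path(tempdir(), "example_plot.pdf")
ggsave(filename = out_file, plot = example_plot,
       width = 6, height = 4, units = "in")
```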

Demographic information of midwest counties in the US

  1. Print the midwest dataset and look at its help menu (?midwest) to see which variables it contains. It should look like this:
# A tibble: 437 x 28
     PID    county state  area poptotal popdensity popwhite popblack
   <int>     <chr> <chr> <dbl>    <int>      <dbl>    <int>    <int>
 1   561     ADAMS    IL 0.052    66090  1270.9615    63917     1702
 2   562 ALEXANDER    IL 0.014    10626   759.0000     7054     3496
 3   563      BOND    IL 0.022    14991   681.4091    14477      429
 4   564     BOONE    IL 0.017    30806  1812.1176    29344      127
 5   565     BROWN    IL 0.018     5836   324.2222     5264      547
 6   566    BUREAU    IL 0.050    35688   713.7600    35157       50
 7   567   CALHOUN    IL 0.017     5322   313.0588     5298        1
 8   568   CARROLL    IL 0.027    16805   622.4074    16519      111
 9   569      CASS    IL 0.024    13437   559.8750    13384       16
10   570 CHAMPAIGN    IL 0.058   173025  2983.1897   146506    16559
# ... with 427 more rows, and 20 more variables: popamerindian <int>,
#   popasian <int>, popother <int>, percwhite <dbl>, percblack <dbl>,
#   percamerindan <dbl>, percasian <dbl>, percother <dbl>,
#   popadults <int>, perchsd <dbl>, percollege <dbl>, percprof <dbl>,
#   poppovertyknown <int>, percpovertyknown <dbl>, percbelowpoverty <dbl>,
#   percchildbelowpovert <dbl>, percadultpoverty <dbl>,
#   percelderlypoverty <dbl>, inmetro <int>, category <chr>
  1. Using the following code as a template, create the following plot showing the relationship between college education and poverty
ggplot(data = XX, 
    mapping = aes(x = XX, y = XX)) + 
    geom_point(aes(fill = XX, size = XX), shape = 21, color = "white") + 
    geom_smooth(aes(x = XX, y = XX)) +
    labs(
        x = "XX", 
        y = "XX", 
        title = "XX",
        subtitle = "XX",
        caption = "XX") + 
    scale_fill_brewer(palette = "XX") + 
    scale_size(range = c(XX, XX)) +
    guides(size = guide_legend(override.aes = list(col = "black")), 
           fill = guide_legend(override.aes = list(size = 5)))

  1. Create the following density plot showing the density of inhabitants with a college education in different states using the following template
ggplot(data = XX, 
       mapping = aes(XX, fill = XX)) + 
  geom_density(alpha = XX) + 
  labs(title = "XX", 
       subtitle = "XX",
       caption = "XX",
       x = "XX",
       y = "XX",
       fill = "XX")
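The point layer in the college and poverty template relies on a detail that is easy to miss, sketched here on the mpg data instead of midwest. Point shape 21 is a circle with a separate border: fill colors the inside and color the outline. Because the groups are mapped to fill, the matching palette scale is scale_fill_brewer(). The palette and size range are arbitrary choices.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  # shape 21: 'fill' colors the inside of the circle, 'color' its border
  geom_point(aes(fill = drv, size = cyl), shape = 21, color = "white") +
  scale_fill_brewer(palette = "Set1") +   # fill is mapped, so use a fill scale
  scale_size(range = c(1, 6)) +
  # Make the legend keys visible: black outlines for size, large dots for fill
  guides(size = guide_legend(override.aes = list(color = "black")),
         fill = guide_legend(override.aes = list(size = 5)))

p
```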

Heatplots with geom_tile()

  1. You can create heatplots using the geom_tile() function. Try creating the following heatplot of statistics of NBA players using the following template:
# Read in nba data
nba_long <- read.csv("https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_data/nba_long.csv")

ggplot(XX, 
       mapping = aes(x = XX, y = XX, fill = XX)) + 
  geom_tile(colour = "XX") + 
  scale_fill_gradientn(colors = c("XX", "XX", "XX"))+ 
  labs(x = "XX", 
       y = "XX", 
       fill = "XX", 
       title = "NBA XX performance",
       subtitle = "XX",
       caption = "XX") +
  coord_flip()

  1. Make the following plot of savings data (psavert) from the economics dataset.

  1. Make the following plot from the ACTG175 dataset. To do this, you’ll need to use both geom_boxplot() and geom_point(). To jitter the points, use the position argument to geom_point(), as well as the position_jitter() function to control how much to jitter the points.

  1. Create the following lollipop chart from the midwest data.
midwest_IL <- midwest %>% 
  filter(state == "XX") %>%
  mutate(popdensity_z = (popdensity - mean(popdensity)) / sd(popdensity)) %>%
  arrange(desc(popdensity_z)) %>%
  mutate(county = factor(county, levels = county)) %>%
  slice(1:25)

ggplot(XX, aes(x = XX, y = XX)) + 
  geom_segment(aes(y = 0, 
                   x = county, 
                   yend = popdensity_z, 
                   xend = county, 
                   col = popdensity_z), linewidth = XX) +
  geom_point(size = XX, fill = "white", shape = 21)  +
  labs(title = "XX", 
       subtitle = "XX",
       y = "XX",
       x = "XX") + 
  ylim(XX, XX) +
  scale_colour_gradient(low = "XX", high = "XX", limits = c(-.1, 9)) +
  coord_flip() +
  geom_text(aes(label = 1:25)) +
  guides(col = "none") +
  theme_XX() +
  theme(panel.grid = element_blank())

  1. The code to create mpg_agg, a dataframe containing aggregated data from the mpg dataframe, is below. Once you’ve created mpg_agg, create the following heat plot using geom_tile()
# Calculate mean highway miles per gallon for each combination of
#  manufacturer and class

mpg_agg <- mpg %>%
  group_by(manufacturer, class) %>%
  summarise(
    hwy_mean = mean(hwy),
    .groups = "drop"    # return an ungrouped tibble
  )
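Two techniques from this section can be sketched on the mpg data, using a different aggregation than mpg_agg. The variable cty_agg, the (drv, cyl) grouping, and the colors are hypothetical choices for illustration. geom_tile() draws one rectangle per row, so the data are aggregated first. position_jitter() adds a controlled amount of random horizontal noise to points drawn over boxplots.

```r
library(ggplot2)
library(dplyr)

# Aggregate first: one row per (drv, cyl) combination with the mean city mpg
cty_agg <- mpg %>%
  group_by(drv, cyl) %>%
  summarise(cty_mean = mean(cty), .groups = "drop")

# Heatplot: each tile is one row of cty_agg, and its fill encodes the mean
p_tile <- ggplot(cty_agg, mapping = aes(x = factor(cyl), y = drv, fill = cty_mean)) +
  geom_tile(colour = "white") +
  scale_fill_gradientn(colors = c("white", "orange", "red")) +
  labs(x = "Cylinders", y = "Drive type", fill = "Mean city mpg")
p_tile

# Boxplots with jittered raw points; width sets the horizontal spread, height = 0 keeps y exact
p_jitter <- ggplot(mpg, mapping = aes(x = drv, y = hwy)) +
  geom_boxplot(outlier.shape = NA) +    # hide outliers, the points already show them
  geom_point(position = position_jitter(width = 0.2, height = 0), alpha = 0.5)
p_jitter
```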

References